A Multi-Strategy Approach for Location Mining in Tweets: AUT NLP Group Entry for ALTA-2014 Shared Task
نویسندگان
چکیده
This paper describes the strategy and the results of a location mining system used for the ALTA-2014 shared task competition. The task required the participants to identify the location mentions in 1003 Twitter test messages given a separate annotated training set of 2000 messages. We present an architecture that uses a basic named entity recognizer in conjunction with various rule-based modules and knowledge infusion to achieve an average F score of 0.747 which won the second place in the competition. We used the pre-trained Stanford NER which gives us an F score of 0.532 and used an ensemble of other techniques to reach the 0.747 value. The other major source of location resolver was the DBpedia location list which was used to identify a large percentage of locations with an individual F-score of 0.935.
منابع مشابه
Overview of the 2014 ALTA Shared Task: Identifying Expressions of Locations in Tweets
This year was the fifth in the ALTA series of shared tasks. The topic of the 2014 ALTA shared task was to identify location information in tweets. As in past competitions, we used Kaggle in Class as the framework for submission, evaluation and communication with the participants. In this paper we describe the details of the shared task, evaluation method, and results of the participating systems.
متن کاملLocation Mention Detection in Tweets and Microblogs
The automatic identification of location expressions in social media text is an actively researched task. We present a novel approach to detection mentions of locations in the texts of microblogs and social media. We propose an approach based on Noun Phrase extraction and n-gram based matching instead of the traditional methods using Named Entity Recognition (NER) or Conditional Random Fields (...
متن کاملMSR-NLP Entry in BioNLP Shared Task 2011
We describe the system from the Natural Language Processing group at Microsoft Research for the BioNLP 2011 Shared Task. The task focuses on event extraction, identifying structured and potentially nested events from unannotated text. Our approach follows a pipeline, first decorating text with syntactic information, then identifying the trigger words of complex events, and finally identifying t...
متن کاملIdentifying Twitter Location Mentions
This paper describes our system in the ALTA shared task 2014. The task is to identify location mentions in Twitter messages, such as place names and point-ofinterests (POIs). We formulated the task as a sequential labelling problem, and explored various features on top of a conditional random field (CRF) classifier. The system achieved 0.726 mean-F measure on the held-out evaluation data. We di...
متن کاملNLP CEN AMRITA @ SMM4H: Health Care Text Classification through Class Embeddings
Artificial Intelligence has been a major breakthrough in many domains. Now, it has started automating health care domain through Natural Language Processing and Computer Vision applications. As a part of it, researchers are now focusing more on mining health related information from the text shared through social media and clinical trials. This paper explains about our system for health care te...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2014